On the optimality of the Gittins index rule for multi-armed bandits with multiple plays
Abstract
We investigate the general multi-armed bandit problem with multiple servers. We determine a condition on the reward processes sufficient to guarantee the optimality of the strategy that operates at each instant of time the projects with the highest Gittins indices. We call this strategy the Gittins index rule for multi-armed bandits with multiple plays, or briefly the Gittins index rule. We show by examples that: (i) the aforementioned sufficient condition is not necessary for the optimality of the Gittins index rule; and (ii) when the sufficient condition is relaxed the Gittins index rule is not necessarily optimal. Finally, we present an application of the general results to the multiserver scheduling of parallel queues without arrivals.
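The selection step of the rule described in the abstract can be illustrated with a minimal sketch. It assumes the Gittins index of every project in its current state has already been computed by some other means; the function name `gittins_index_rule`, its parameters, and the tie-breaking convention are hypothetical and not taken from the paper. The sketch only picks which projects to operate and does not check the paper's sufficient condition for optimality.

```python
import heapq
from typing import List, Sequence


def gittins_index_rule(indices: Sequence[float], num_servers: int) -> List[int]:
    """Pick the projects to operate at one instant of time.

    indices[i] is assumed to be the (precomputed) Gittins index of project i
    in its current state. Returns the positions of the num_servers projects
    with the highest indices, in ascending position order.
    """
    if num_servers < 0:
        raise ValueError("num_servers must be non-negative")
    # Never select more projects than exist.
    k = min(num_servers, len(indices))
    # Assumption: ties are broken in favour of the lower-numbered project.
    chosen = heapq.nlargest(
        k, range(len(indices)), key=lambda i: (indices[i], -i)
    )
    return sorted(chosen)


# Four projects, two servers: the projects at positions 1 and 3 are operated.
selected = gittins_index_rule([0.2, 0.9, 0.5, 0.7], 2)
```

In a full simulation this selection would be repeated at every decision instant, after the operated projects change state and their indices are recomputed.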
Related resources
A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements
We generalise classical multi-armed and restless bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch provided they do not consume more resource than is...
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Git...
A Faster Index Algorithm and a Computational Study for Bandits with Switching Costs
We address the intractable multi-armed bandit problem with switching costs, for which Asawa and Teneketzis introduced in [M. Asawa and D. Teneketzis. 1996. Multi-armed bandits with switching penalties. IEEE Trans. Automat. Control, 41 328–348] an index that partially characterizes optimal policies, attaching to each project state a “continuation index” (its Gittins index) and a “switching index...
A Note on Bandits with a Twist
A variant of the multi-armed bandit problem was recently introduced by Dimitriu, Tetali and Winkler. For this model (and a mild generalization) we propose faster algorithms to compute the Gittins index. The indexability of such models follows from earlier work of Nash on generalized bandits.
Computing a Classic Index for Finite-Horizon Bandits
This paper considers the efficient exact computation of the counterpart of the Gittins index for a finite-horizon discrete-state bandit, which measures for each initial state the average productivity, given by the maximum ratio of expected total discounted reward earned to expected total discounted time expended that can be achieved through a number of successive plays stopping by the given horizon...
Journal title:
- Math. Meth. of OR
Volume 50
Publication date: 1999